Overview

Dataset statistics

Number of variables: 4
Number of observations: 623
Missing cells: 583
Missing cells (%): 23.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 19.6 KiB
Average record size in memory: 32.2 B
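
The missing-cell percentage can be checked from the counts in this section. The sketch below is only an arithmetic check; the variable names are illustrative and do not come from the report.

```python
# Counts taken from the "Dataset statistics" block of this report.
n_variables = 4
n_observations = 623
missing_cells = 583

# Every cell is one (row, column) position in the table.
total_cells = n_variables * n_observations
missing_cells_pct = 100 * missing_cells / total_cells

# All 583 missing cells sit in nextepisode_id (see the Alerts section),
# so the per-column share is computed against the row count alone.
missing_column_pct = 100 * missing_cells / n_observations

print(f"Missing cells: {missing_cells_pct:.1f}%")
print(f"Missing in nextepisode_id: {missing_column_pct:.1f}%")
```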

Variable types

Numeric: 4

Alerts

previousepisode_id is highly correlated with nextepisode_id (High correlation)
nextepisode_id is highly correlated with previousepisode_id (High correlation)
nextepisode_id has 583 (93.6%) missing values (Missing)
shows_id is uniformly distributed (Uniform)
shows_id has unique values (Unique)
episode_id has unique values (Unique)
previousepisode_id has unique values (Unique)

Reproduction

Analysis started: 2022-06-27 20:42:05.775387
Analysis finished: 2022-06-27 20:43:03.244076
Duration: 57.47 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

shows_id
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 623
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 311
Minimum: 0
Maximum: 622
Zeros: 1
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 31.1
Q1: 155.5
Median: 311
Q3: 466.5
95-th percentile: 590.9
Maximum: 622
Range: 622
Interquartile range (IQR): 311
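
The quantile figures above can be reproduced with the standard library. This sketch assumes that shows_id is the integer sequence 0..622, which the minimum, maximum, distinct count and sample rows suggest but the report does not state outright. The "inclusive" method interpolates linearly between neighbouring values. It happens to match the numbers shown here, and it is not necessarily how the report computed them.

```python
import statistics

# Assumption: shows_id is simply 0, 1, ..., 622.
shows_id = list(range(623))

# Quartiles with linear interpolation between neighbouring values.
q1, median, q3 = statistics.quantiles(shows_id, n=4, method="inclusive")

# Cutting into 20 groups gives the 5th, 10th, ..., 95th percentiles.
twentieths = statistics.quantiles(shows_id, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

iqr = q3 - q1
value_range = max(shows_id) - min(shows_id)

print(p5, q1, median, q3, p95, value_range, iqr)
```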

Descriptive statistics

Standard deviation: 179.9888885
Coefficient of variation (CV): 0.5787424069
Kurtosis: -1.2
Mean: 311
Median Absolute Deviation (MAD): 156
Skewness: 0
Sum: 193753
Variance: 32396
Monotonicity: Strictly increasing
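
As a rough check on how these descriptive statistics relate to each other, the sketch below recomputes several of them. It assumes that shows_id is the integer sequence 0..622, which the report does not state outright. The variance shown in the report agrees with the sample variance (n - 1 denominator), the CV is the standard deviation divided by the mean, and the MAD is the median of the absolute deviations from the median.

```python
import statistics

# Assumption: shows_id is simply 0, 1, ..., 622.
shows_id = list(range(623))

total = sum(shows_id)
mean = statistics.mean(shows_id)
variance = statistics.variance(shows_id)  # sample variance, n - 1 denominator
std = statistics.stdev(shows_id)          # square root of the sample variance
cv = std / mean                           # coefficient of variation

median = statistics.median(shows_id)
mad = statistics.median([abs(x - median) for x in shows_id])

print(total, mean, variance, round(std, 7), round(cv, 10), mad)
```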
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
0                     1       0.2%
418                   1       0.2%
411                   1       0.2%
412                   1       0.2%
413                   1       0.2%
414                   1       0.2%
415                   1       0.2%
416                   1       0.2%
417                   1       0.2%
419                   1       0.2%
Other values (613)    613     98.4%

Minimum 10 values

Value   Count   Frequency (%)
0       1       0.2%
1       1       0.2%
2       1       0.2%
3       1       0.2%
4       1       0.2%
5       1       0.2%
6       1       0.2%
7       1       0.2%
8       1       0.2%
9       1       0.2%

Maximum 10 values

Value   Count   Frequency (%)
622     1       0.2%
621     1       0.2%
620     1       0.2%
619     1       0.2%
618     1       0.2%
617     1       0.2%
616     1       0.2%
615     1       0.2%
614     1       0.2%
613     1       0.2%

episode_id
Real number (ℝ≥0)

UNIQUE

Distinct: 623
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45929.2825
Minimum: 802
Maximum: 62764
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 802
5-th percentile: 15332.8
Q1: 42833.5
Median: 50918
Q3: 52740
95-th percentile: 59396.2
Maximum: 62764
Range: 61962
Interquartile range (IQR): 9906.5

Descriptive statistics

Standard deviation: 12728.45241
Coefficient of variation (CV): 0.2771315317
Kurtosis: 2.605927473
Mean: 45929.2825
Median Absolute Deviation (MAD): 3891
Skewness: -1.730461975
Sum: 28613943
Variance: 162013500.7
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
41648                 1       0.2%
49484                 1       0.2%
54837                 1       0.2%
62306                 1       0.2%
23401                 1       0.2%
49721                 1       0.2%
60427                 1       0.2%
41192                 1       0.2%
43883                 1       0.2%
45090                 1       0.2%
Other values (613)    613     98.4%

Minimum 10 values

Value   Count   Frequency (%)
802     1       0.2%
1596    1       0.2%
1825    1       0.2%
2266    1       0.2%
2504    1       0.2%
2855    1       0.2%
3734    1       0.2%
4091    1       0.2%
6090    1       0.2%
6097    1       0.2%

Maximum 10 values

Value   Count   Frequency (%)
62764   1       0.2%
62545   1       0.2%
62418   1       0.2%
62306   1       0.2%
62127   1       0.2%
61909   1       0.2%
61755   1       0.2%
61674   1       0.2%
61556   1       0.2%
61536   1       0.2%

previousepisode_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 623
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2142028.751
Minimum: 1652175
Maximum: 2353919
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 1652175
5-th percentile: 1969230.8
Q1: 1992892.5
Median: 2134173
Q3: 2289647.5
95-th percentile: 2347856
Maximum: 2353919
Range: 701744
Interquartile range (IQR): 296755

Descriptive statistics

Standard deviation: 144940.6898
Coefficient of variation (CV): 0.06766514677
Kurtosis: -1.430340466
Mean: 2142028.751
Median Absolute Deviation (MAD): 145276
Skewness: 0.05415212197
Sum: 1334483912
Variance: 2.100780357 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value                 Count   Frequency (%)
1988862               1       0.2%
2238897               1       0.2%
2074193               1       0.2%
2345008               1       0.2%
2092041               1       0.2%
2308344               1       0.2%
2300446               1       0.2%
2045788               1       0.2%
2325095               1       0.2%
2353723               1       0.2%
Other values (613)    613     98.4%

Minimum 10 values

Value     Count   Frequency (%)
1652175   1       0.2%
1830496   1       0.2%
1910456   1       0.2%
1943281   1       0.2%
1944215   1       0.2%
1944257   1       0.2%
1945592   1       0.2%
1947707   1       0.2%
1949336   1       0.2%
1949636   1       0.2%

Maximum 10 values

Value     Count   Frequency (%)
2353919   1       0.2%
2353800   1       0.2%
2353723   1       0.2%
2353720   1       0.2%
2353507   1       0.2%
2353074   1       0.2%
2353068   1       0.2%
2353058   1       0.2%
2353014   1       0.2%
2352999   1       0.2%

nextepisode_id
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 40
Distinct (%): 100.0%
Missing: 583
Missing (%): 93.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 2324558.35
Minimum: 2164172
Maximum: 2353605
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.0 KiB

Quantile statistics

Minimum: 2164172
5-th percentile: 2254771.45
Q1: 2325088.25
Median: 2336399
Q3: 2346706.25
95-th percentile: 2353257.6
Maximum: 2353605
Range: 189433
Interquartile range (IQR): 21618

Descriptive statistics

Standard deviation: 39671.71146
Coefficient of variation (CV): 0.01706634357
Kurtosis: 8.062347074
Mean: 2324558.35
Median Absolute Deviation (MAD): 11281.5
Skewness: -2.740791524
Sum: 92982334
Variance: 1573844690
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=40)
Common values

Value                Count   Frequency (%)
2338353              1       0.2%
2313289              1       0.2%
2259031              1       0.2%
2323925              1       0.2%
2339476              1       0.2%
2341824              1       0.2%
2330176              1       0.2%
2350906              1       0.2%
2326393              1       0.2%
2328642              1       0.2%
Other values (30)    30      4.8%
(Missing)            583     93.6%

Minimum 10 values

Value     Count   Frequency (%)
2164172   1       0.2%
2201523   1       0.2%
2257574   1       0.2%
2259031   1       0.2%
2299085   1       0.2%
2309422   1       0.2%
2310102   1       0.2%
2313289   1       0.2%
2320459   1       0.2%
2323925   1       0.2%

Maximum 10 values

Value     Count   Frequency (%)
2353605   1       0.2%
2353554   1       0.2%
2353242   1       0.2%
2353070   1       0.2%
2352602   1       0.2%
2352585   1       0.2%
2351377   1       0.2%
2350906   1       0.2%
2349621   1       0.2%
2348039   1       0.2%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
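
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code used to produce this report. The helper names (average_ranks, pearson_r, spearman_rho) and the toy data are assumptions made for the example. Tied values receive the average of their rank positions, which is a common convention.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)  # the 1/(n-1) factors cancel


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

On this toy data ρ reaches 1 because the ordering of x and y is identical, while r stays below 1 because the relationship is not a straight line.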

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
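
A minimal sketch of this formula follows. The function name pearson_r and the toy data are assumptions for illustration and are not taken from the report. The two straight lines have very different slopes and offsets, yet both give r = 1, which illustrates the invariance to location and scale.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors of the covariance and the variances cancel.
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_steep = [10 * v + 3 for v in x]      # steep line
y_shallow = [0.1 * v - 7 for v in x]   # shallow line, different offset
y_falling = [-2 * v for v in x]        # negative slope

r_steep = pearson_r(x, y_steep)
r_shallow = pearson_r(x, y_shallow)
r_falling = pearson_r(x, y_falling)
print(r_steep, r_shallow, r_falling)
```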

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
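
The pair-counting definition can be sketched as follows. The formula as worded here corresponds to the variant usually called tau-a. Libraries often default to a tie-corrected variant, so results can differ when ties are present. The function name kendall_tau and the toy data are assumptions for illustration. The last example uses the three rows from the "First rows" sample in which both previousepisode_id and nextepisode_id are present.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables move in the same direction
            concordant += 1
        elif product < 0:    # the variables move in opposite directions
            discordant += 1
        # product == 0 is a tie and counts as neither
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data: one swapped pair among four observations (5 concordant, 1 discordant).
tau_toy = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])

# Rows 4, 8 and 9 of the "First rows" sample.
previous_ids = [2309421, 2336746, 2299084]
next_ids = [2309422, 2336747, 2299085]
tau_sample = kendall_tau(previous_ids, next_ids)

print(round(tau_toy, 4), tau_sample)
```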

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
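
As a small illustration of nullity by column, the sketch below counts missing values in the ten rows listed under "First rows" and prints a text-only nullity matrix. It is only a stand-in for the plots in the original report. The variable names and the "#" / "." rendering are choices made for this example.

```python
columns = ["shows_id", "episode_id", "previousepisode_id", "nextepisode_id"]

# The ten rows shown under "First rows"; None marks a missing value.
first_rows = [
    (0, 41648, 1988862, None),
    (1, 52198, 1986873, None),
    (2, 52933, 2245512, None),
    (3, 51336, 1964569, None),
    (4, 54033, 2309421, 2309422),
    (5, 61674, 2315117, None),
    (6, 52038, 1973545, None),
    (7, 52373, 1984264, None),
    (8, 55016, 2336746, 2336747),
    (9, 57339, 2299084, 2299085),
]

# Nullity by column: how many values are missing in each variable.
missing_per_column = {
    name: sum(row[i] is None for row in first_rows)
    for i, name in enumerate(columns)
}

# Text nullity matrix: one line per row, '#' = present, '.' = missing.
nullity_matrix = [
    "".join("." if value is None else "#" for value in row)
    for row in first_rows
]

for name, count in missing_per_column.items():
    print(f"{name}: {count} missing")
print("\n".join(nullity_matrix))
```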

Sample

First rows

     shows_id  episode_id  previousepisode_id  nextepisode_id
0           0       41648             1988862            None
1           1       52198             1986873            None
2           2       52933             2245512            None
3           3       51336             1964569            None
4           4       54033             2309421         2309422
5           5       61674             2315117            None
6           6       52038             1973545            None
7           7       52373             1984264            None
8           8       55016             2336746         2336747
9           9       57339             2299084         2299085

Last rows

     shows_id  episode_id  previousepisode_id  nextepisode_id
613       613       55288             2301754            None
614       614       44343             2132467            None
615       615       19355             1998683            None
616       616       32649             1949336            None
617       617       34187             1996786            None
618       618       51826             2050241            None
619       619       52847             2001674            None
620       620       53581             2046124            None
621       621       59380             2234297            None
622       622       39441             2194257            None